Novel filler acoustic models for connected digit recognition
Abstract
The context-dependent modeling technique is extended to include non-speech filler segments occurring between speech word units. In addition to the conventional context-dependent word or subword units, the proposed acoustic modeling provides an efficient way of accounting for the effects of the surrounding speech on the inter-word non-speech segments, especially for small-vocabulary recognition tasks. It is argued that a robust recognition scheme is obtained by explicitly accounting for context-dependent inter-word filler acoustics in training while ignoring their explicit context dependencies during recognition testing. Results on a connected digit recognition task over the telephone network indicate that the improved model set lowers the word error rate from 2.5% to 0.9%, i.e., a relative word error-rate reduction of about 64%.
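The quoted 64% figure is a relative reduction, not a difference in percentage points. A minimal sketch of that arithmetic follows; the function name `relative_error_reduction` is a hypothetical helper introduced only for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_pct: float, improved_pct: float) -> float:
    """Return the relative reduction in error rate, as a percentage.

    Hypothetical helper for illustration; both arguments are error rates in percent.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_pct - improved_pct) / baseline_pct


# Word error rates reported in the abstract: 2.5% (baseline) and 0.9% (improved).
reduction = relative_error_reduction(2.5, 0.9)
print(f"{reduction:.0f}% relative word error-rate reduction")
```

The absolute improvement is 1.6 percentage points; dividing by the 2.5% baseline gives the relative figure stated in the abstract.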
Similar resources
Discriminative Utterance Verification For Connected Digits Recognition - IEEE Transactions on Speech and Audio Processing
Utterance verification represents an important technology in the design of user-friendly speech recognition systems. It involves the recognition of keyword strings and the rejection of nonkeyword strings. This paper describes a hidden Markov model-based (HMM-based) utterance verification system using the framework of statistical hypothesis testing. The two major issues on how to design keyword ...
Natural number recognition using MCE trained inter-word context dependent acoustic models
Among applications that require number recognition, the focus has largely been on connected digit recognizers. In this paper, we introduce an acoustic model topology for natural number recognition by using minimum classification error (MCE) training of inter-word context dependent models of the head-body-tail (HBT) type. Experimental results on natural number applications involving dollar amoun...
Connected Digit Recognition with Class Specific Word Models
This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utteran...
Using Duration Information in Cantonese Connected-Digit Recognition
This paper presents an investigation on the use of explicit statistical duration models for Cantonese connected-digit recognition. Cantonese is a major Chinese dialect. The phonetic compositions of Cantonese digits are generally very simple. Some of them contain only a single vowel or nasal segment. This makes it difficult to attain high accuracy in the automatic recognition of Cantonese digit ...
Unified acoustic modeling for continuous speech recognition
Usually the speech and the silence models are trained together depending upon the type of recognition task. For example, if the recognition task is only on connected-digits then the corresponding digit models are built using only the connected-digit training corpus. Similarly for large-vocabulary recognition tasks, the subword or the phoneme models are generated using only the subword training ...
Publication date: 1997